Computer and Modernization ›› 2010, Vol. 1 ›› Issue (8): 5-7. doi: 10.3969/j.issn.1006-2475.2010.08.002

• Algorithm Design and Analysis •

Design and Research of Parallel Clustering Algorithm

MENG Hai-dong, YANG Yan-kan

  1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
  • Received: 2010-04-06 Revised: 1900-01-01 Online: 2010-08-27 Published: 2010-08-27

Abstract:

When processing massive data sets, the limited processing power of a single computer makes it difficult for traditional clustering algorithms to produce results within a practical amount of time. Building on a clustering algorithm based on density and adaptive density-reachability, a parallel clustering algorithm is proposed. Theoretical analysis and experimental results show that the algorithm achieves near-linear speedup and can handle large-scale data sets effectively.

Key words: parallel clustering, massive data sets, computer cluster
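
The near-linear speedup mentioned in the abstract can be read against the standard definitions of speedup and parallel efficiency. The symbols below are assumptions introduced here for illustration and do not come from the paper: $T_1$ is the running time on one node, $T_p$ the running time on $p$ nodes of the cluster.

```latex
% Standard definitions; T_1, T_p and p are illustrative symbols, not the paper's notation.
\[
  S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\]
% "Near-linear speedup" then means S_p \approx p, i.e. E_p stays close to 1 as p grows.
```

As a hypothetical reading, a run that takes 80 minutes on one node and 11 minutes on 8 nodes would give $S_8 \approx 7.3$ and $E_8 \approx 0.91$; these numbers are made up for illustration and are not results from the paper.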